A low-dimensional approximation of optimal confidence

Human decision making is accompanied by a sense of confidence. According to Bayesian decision theory, confidence reflects the learned probability of making a correct response, given available data (e.g., accumulated stimulus evidence and response time). Although optimal, independently learning these probabilities for all possible data combinations is computationally intractable. Here, we describe a novel model of confidence implementing a low-dimensional approximation of this optimal yet intractable solution. This model allows efficient estimation of confidence, while at the same time accounting for idiosyncrasies, different kinds of biases and deviation from the optimal probability correct. Our model dissociates confidence biases resulting from the estimate of the reliability of evidence by individuals (captured by parameter α), from confidence biases resulting from general stimulus independent under and overconfidence (captured by parameter β). We provide empirical evidence that this model accurately fits both choice data (accuracy, response time) and trial-by-trial confidence ratings simultaneously. Finally, we test and empirically validate two novel predictions of the model, namely that 1) changes in confidence can be independent of performance and 2) selectively manipulating each parameter of our model leads to distinct patterns of confidence judgments. As a tractable and flexible account of the computation of confidence, our model offers a clear framework to interpret and further resolve different forms of confidence biases.


Introduction
Decision confidence refers to a subjective feeling reflecting how confident agents feel about the accuracy of their decisions. This feeling of confidence often closely tracks the objective accuracy [1]: people usually report high confidence for correct trials and low confidence for errors. This observation is in line with the theoretical proposal that confidence reflects the Bayesian posterior probability that a decision is correct given available data [1-3]. As such, confidence represents valuable information that is taken into account to guide adaptive behavior, including learning [4-6]; speed-accuracy tradeoff adjustments [7,8]; and information seeking [9]. Therefore, having an accurate sense of confidence that best matches one's accuracy is of utmost importance to maintain adaptive behavior. However, estimating the Bayesian probability with limited data is computationally intractable. Additionally, empirical dissociations between confidence and accuracy are widespread, most prominently in cases of blindsight [10], change blindness [11] and anterior prefrontal lesions [12]. Such dissociations pose a serious challenge for the Bayesian interpretation of confidence. In this work, we reconcile these findings by proposing and empirically validating a low-dimensional approximation to the Bayesian probability, offering both a computationally tractable and flexible model for the computation of decision confidence.
Most attempts at modeling decision confidence have done so within the context of existing models of decision making. One highly influential account is based on the idea that decision making reflects a process of noisy accumulation of evidence until a decision boundary is reached [13]. For example, the drift-diffusion model (DDM) describes the decision-making process as the noisy accumulation of evidence in favor of one of two options. Here, evidence accumulates with a certain drift rate (representing the efficiency of evidence accumulation) until reaching a decision threshold, at which point a response is issued. Several approaches have been put forward to account for confidence within the DDM framework [14-16]. The most prominent approach relies on the Bayesian interpretation of confidence, modeling it as the probability of a choice being correct given the available data. Within the drift diffusion model, the available data to participants is the amount of accumulated evidence and the time spent accumulating, which are then combined into a probability that the decision was correct [2,15]. Such formalization of decision confidence is sometimes referred to as the "Bayesian readout" [17]. This Bayesian readout can be represented as a heatmap on the two-dimensional (data) space formed by both evidence and time. In Fig 1A, it can be seen that the Bayesian readout hypothesis predicts that confidence will be higher for trials with more accumulated evidence (reflected on the y-axis) and lower for trials with a longer decision duration (reflected on the x-axis). Consistent with these predictions, confidence indeed depends on evidence strength [1,2] and on elapsed decision time [14]. More generally, this modeling approach has been successful in explaining a wealth of data [17-19].
To compute confidence by reading out the probability correct given evidence and time, humans must have an accurate representation of the entire space created by crossing these two variables (i.e. the heatmap shown in Fig 1A). Previous accounts propose that individuals learn this mapping via experience [2]. However, computing the exact probability would in principle require estimating it at each point of the infinite-size (evidence, time) plane. Independently learning all positions on this heatmap in this way would either take a lot of time or yield very noisy estimates. Thus, tractability is a key issue that needs to be addressed in order to understand how humans learn the probability correct given evidence and time. In typical Bayesian modeling approaches, the traditional computational solution to this high-dimensional problem is some kind of function approximation [20], whereby the probability map is approximated by a (much) lower dimensional function. Following that logic, in the current work we propose the Low-Dimensional Confidence (LDC) model, a simple yet efficient cognitive algorithm that approximates the optimal yet intractable Bayesian readout. Instead of learning the probability correct independently for all points on the (evidence, time) plane, the LDC model parameterizes the mapping in a way that only two values need to be learned by a cognitive agent to approximate the Bayesian readout. In the following sections, we describe how LDC allows the mapping from evidence and time to confidence to be computed tractably. Using simulated data, we show that LDC provides a close approximation of Bayesian confidence. We then proceed to test and validate our model with human data.

Fig 1. A. Confidence is thought to represent the Bayesian probability of a choice being correct conditional on evidence, time and choice. Within this theory, confidence is quantified as this probability, represented by the color on the heatmap. B. Because this optimal solution is intractable, the LDC model proposes a low-dimensional parametrization of this framework, which allows efficient estimation of confidence, while accounting for idiosyncrasies and confidence biases. The LDC model can generate a heatmap representing confidence which closely approximates the optimal Bayesian probability. Values of α and β were obtained by fitting the LDC model to the Bayesian probability of being correct over 1 000 000 simulated trials. Confidence for the trial plotted on top of the heatmap is given by Eq (3). Here, confidence = .85. C-F. To show the effectiveness of the LDC model we generated statistical signatures of confidence [1] based on the Bayesian read-out of confidence (error bars reflecting SEM, simulated N = 100) and based on the LDC model fits (shaded lines reflecting SEM). High and low confidence trials in panel E were obtained by performing a median split. https://doi.org/10.1371/journal.pcbi.1012273.g001

The low-dimensional approximation of confidence model (LDC model)
Constructing an accurate representation of confidence based on a limited number of samples is infeasible. However, under standard DDM assumptions, the probability of a correct choice given accumulated evidence and elapsed time can be expressed as the probability of drift rate v being positive in case of an upper boundary hit (and conversely p(v<0) in the lower boundary hit case). Such probability is characterized as [15]:

$$p(v > 0 \mid e, t) = \Phi\left(\frac{e}{\sigma\sqrt{t}}\right) \tag{1}$$

where e is the accumulated evidence, t is the elapsed time, Φ is the cumulative distribution function of the standard normal distribution and σ is the within-trial noise of the DDM accumulator. Given that Φ is an integral without closed-form solution, it requires an infinite number of standard operations to be computed. We propose to approximate Φ with a more tractable logistic function ([21]; see S1 Text for a detailed derivation of our model):

$$p(v > 0 \mid e, t) \approx \frac{1}{1 + \exp\left(-\frac{\lambda}{\sigma}\,\frac{e}{\sqrt{t}}\right)} \tag{2}$$

where λ ≈ 1.7 is a constant that optimizes the approximation [21]. In its current form, the formalization of confidence proposed in Eq (2) cannot account for idiosyncrasies [22], diverse types of confidence biases and deviations from the optimal probability of a correct choice typically observed in empirical work [23-25]. We identify two distinct forms of deviation from the Bayesian probability correct: sensitivity in how evidence is treated and mapped onto confidence judgments, and a general increase or decrease in confidence ratings. In order to make the formulation of confidence flexible to these distinct forms of deviation, we thus further parameterize confidence in the following way:

$$\text{confidence} = \frac{1}{1 + \exp\left(-\frac{1}{\sqrt{t}}\left(\alpha\, e\, x + \beta\right)\right)} \tag{3}$$

where x ∈ {−1, 1} is the choice. The two free parameters of this equation capture how strongly individuals weigh evidence in their computation of confidence (α); and a stimulus-independent confidence bias (β). Notably, the approximation of the optimal model in Eq (2) is retrieved from Eq (3) when α = λ/σ and β = 0. As a weighting parameter on evidence, α can be interpreted as individuals' estimate of the reliability of evidence. Intuitively, a very low α implies that during the computation of confidence participants consider the evidence as unreliable (e.g. individuals think that the stimulus is very noisy); a very high α implies that for the computation of confidence participants consider the evidence as overly reliable (e.g. individuals underestimate the amount of noise in the stimulus). In the extreme case where α = 0, the model completely ignores evidence and the computation of confidence is entirely driven by β and time. If additionally β = 0, then confidence will always be .5. At the other end of the spectrum, if α tends to infinity, then the smallest amount of evidence will lead to extreme confidence judgments (i.e. either confidence = 1 if e > 0 or confidence = 0 if e < 0).
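As a quick numerical illustration of why a logistic function can stand in for the normal cumulative distribution function, the sketch below scans a grid of z values and reports the largest absolute gap between Φ(z) and a logistic with slope λ = 1.7. The grid range and step are arbitrary choices made for this illustration, and the helper names are not taken from the paper.

```python
import math

LAMBDA = 1.7  # scaling constant quoted in the text


def normal_cdf(z):
    """Standard normal CDF, computed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative grid: z from -6 to 6 in steps of 0.001 (an arbitrary choice).
grid = [i / 1000.0 for i in range(-6000, 6001)]

# Largest absolute difference between the two curves over the grid.
max_err = max(abs(normal_cdf(z) - logistic(LAMBDA * z)) for z in grid)
print(f"max |Phi(z) - logistic(1.7 z)| on the grid: {max_err:.4f}")
```

On this grid the gap stays below one percentage point, which is the sense in which the logistic form is a close substitute.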
A positive confidence bias (β > 0) implies that the model has a general tendency to be overconfident. If β = 0, the model is unbiased and bases its confidence purely on the evidence accumulated and the time spent accumulating. A negative confidence bias (β < 0) indicates overall underconfidence.
It is important to note that α and β are parameters pertaining only to how the readout of evidence and time is mapped onto a confidence estimate. As such, a change in these parameters would only influence confidence, and leave the decision process unaffected. In contrast, a change in DDM parameters would influence both decision and confidence. For readers more familiar with SDT models of confidence, we note that changes in μ and σ will similarly influence both decision and confidence.
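The parameter behavior described in this section can be sketched in a few lines. The code assumes that the LDC readout has the form confidence = 1 / (1 + exp(−(α·e·x + β)/√t)), which is consistent with the special cases in the text (α = λ/σ and β = 0 recovers the logistic approximation; α = 0 leaves confidence driven by β and time). The trial values, σ, and the function names are hypothetical and chosen only for illustration.

```python
import math


def ldc_confidence(e, t, x, alpha, beta):
    """Assumed LDC readout: logistic of (alpha * e * x + beta) / sqrt(t)."""
    return 1.0 / (1.0 + math.exp(-(alpha * e * x + beta) / math.sqrt(t)))


def choice(e):
    """The choice depends only on the sign of the evidence, not on alpha or beta."""
    return 1 if e > 0 else -1


# Hypothetical trial: evidence 0.08 after 0.6 s, within-trial noise sigma = 0.1.
e, t = 0.08, 0.6
sigma, lam = 0.1, 1.7
x = choice(e)

# alpha = lambda / sigma and beta = 0 gives the logistic approximation of the optimum.
approx_optimal = ldc_confidence(e, t, x, alpha=lam / sigma, beta=0.0)

# alpha = 0 and beta = 0: evidence is ignored and there is no bias.
no_evidence_no_bias = ldc_confidence(e, t, x, alpha=0.0, beta=0.0)

# Shifting beta moves confidence up or down without touching the choice.
overconfident = ldc_confidence(e, t, x, alpha=lam / sigma, beta=0.5)
underconfident = ldc_confidence(e, t, x, alpha=lam / sigma, beta=-0.5)

# With alpha = 0, only beta and elapsed time matter.
bias_only_fast = ldc_confidence(e, 0.3, x, alpha=0.0, beta=0.5)
bias_only_slow = ldc_confidence(e, 1.2, x, alpha=0.0, beta=0.5)

print(approx_optimal, no_evidence_no_bias, underconfident, overconfident)
```

Because `choice` never receives α or β, any change to these two parameters alters confidence only, which mirrors the separation between decision and confidence parameters described above.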

Simulations: The LDC model closely resembles Bayesian confidence
The aim of the current work is to provide a tractable and flexible approximation of the Bayesian readout of confidence. A first test of the LDC model is whether it can effectively approximate the Bayesian readout of confidence. To this end, we generated data from 100 simulated participants from a range of typically observed DDM parameters. Our model was then fit to the true Bayesian posterior probability correct conditional on evidence, time and choice. LDC-predicted confidence almost perfectly correlated with the true probability of being correct (Spearman r(999998) = .99, p < .001). This close resemblance can be appreciated visually by comparing the model-based heatmap (created based on the estimated parameters; Fig 1B) with the heatmap of the Bayesian readout (Fig 1A). To further show that our model closely tracks the Bayesian readout of confidence, we tested its ability to reproduce statistical signatures that confidence should adhere to if it does reflect a Bayesian probability [1]. Ref. [1] identified three qualitative signatures, namely: confidence predicts choice accuracy; confidence increases with evidence strength for correct trials, but decreases with evidence strength for error trials (commonly called the folded X-pattern; [26,27]); and for any level of evidence strength above 0, high confidence trials should be linked with higher accuracy than low confidence trials [1].
Additionally, it is well known that confidence is negatively associated with the speed of responding [14,16,28,29]. This relationship is accounted for by the Bayesian readout of confidence under DDM assumptions (Eq (1)). The intuition behind this is that accumulation time is informative about the difficulty of the decision and thus about the probability of making a correct choice (if a lot of evidence is accumulated within a small amount of time, then one is likely to be in an easy trial, indicating a high probability of making a correct choice). To test how well our model accounts for how time influences Bayesian confidence, we thus introduce a fourth signature, namely average confidence for successive reaction time (RT) bins. As can be seen in Fig 1C-1F, the simulated data showed an excellent fit to the signatures.
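The signatures discussed here can be illustrated with a toy simulation. This is a minimal sketch, not the simulation pipeline used for Fig 1: it assumes a single drift rate, symmetric bounds, a fixed post-decision accumulation period, and the readout form confidence = 1 / (1 + exp(−(α·e·x + β)/√t)). All parameter values and function names are hypothetical.

```python
import math
import random


def simulate_trial(v, a, sigma, dt, post_time, rng):
    """One DDM trial with bounds at +a and -a, then a short post-decision
    accumulation period (a simplifying assumption of this sketch)."""
    e, t = 0.0, 0.0
    while abs(e) < a:
        e += v * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    x = 1 if e > 0 else -1  # choice: upper (+1) or lower (-1) boundary
    # Evidence keeps accumulating after the choice, sampled in one step.
    e += v * post_time + sigma * math.sqrt(post_time) * rng.gauss(0.0, 1.0)
    return e, t + post_time, x


def ldc_confidence(e, t, x, alpha, beta):
    """Assumed LDC readout: logistic of (alpha * e * x + beta) / sqrt(t)."""
    return 1.0 / (1.0 + math.exp(-(alpha * e * x + beta) / math.sqrt(t)))


def mean(values):
    return sum(values) / len(values)


rng = random.Random(1)  # fixed seed for reproducibility
v, a, sigma = 1.0, 1.0, 1.0  # hypothetical DDM parameters
trials = [simulate_trial(v, a, sigma, 0.005, 0.2, rng) for _ in range(2000)]
conf = [ldc_confidence(e, t, x, alpha=1.7 / sigma, beta=0.0) for e, t, x in trials]
correct = [x == 1 for _, _, x in trials]  # drift is positive, so +1 is correct

# Signature: confidence is higher on correct than on error trials.
mean_conf_correct = mean([c for c, ok in zip(conf, correct) if ok])
mean_conf_error = mean([c for c, ok in zip(conf, correct) if not ok])

# Signature: confidence is lower for slower responses (median split on time).
order = sorted(range(len(trials)), key=lambda i: trials[i][1])
half = len(order) // 2
mean_conf_fast = mean([conf[i] for i in order[:half]])
mean_conf_slow = mean([conf[i] for i in order[half:]])

print(mean_conf_correct, mean_conf_error, mean_conf_fast, mean_conf_slow)
```

A median split on time is used here instead of several RT bins only to keep the example short; finer bins would follow the same pattern.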

Empirically testing predictions of the LDC model
Having demonstrated that the LDC model can closely approximate the Bayesian readout of confidence on synthetic data, we next turned to empirical data from human participants. We tested two key predictions of the LDC model. First, the LDC model predicts that changes in confidence can be independent of performance. The two free parameters only describe how evidence and time are combined into a confidence judgment, but they do not affect the process that leads to specific levels of accumulated evidence and elapsed time. Any manipulation that selectively targets confidence while leaving performance unaffected should thus be captured by changes in α and/or β. A second novel prediction is that selective changes in each parameter of our model should lead to distinct modulations of confidence judgments. Thus, a manipulation targeting reliability (α) should lead to qualitatively distinct changes in confidence ratings compared to a manipulation targeting confidence bias (β).
Experiment 1: The LDC model accounts for performance-independent changes in confidence. We first tested a crucial prediction of our model, namely that changes in confidence can occur independent of changes in performance [9,30-32]. Although such dissociations have been observed for several decades (e.g., blindsight [10]), they pose a serious challenge for most current models of confidence. The LDC model naturally accounts for such dissociations. One particularly strong dissociation was observed in our recent work [19], in which a manipulation of participants' prior belief about their ability to perform a task selectively influenced their reported levels of confidence. In Experiment 1 of that paper, participants performed three perceptual tasks consecutively, each divided into a training and a testing phase (Fig 2). During the training phase, participants received feedback about their performance every 24 trials. Although participants were told that the feedback indicated how well they performed the task compared to a reference group, in reality the feedback was made up. Within each task, feedback indicated that performance was worse than that of most other participants (negative condition); that it was average (average condition); or that it was better than that of most other participants (positive condition). During the testing phase, participants no longer received feedback; instead, they rated their confidence at the end of each trial. We observed a direct influence of the feedback manipulation on confidence, with more positive feedback leading to higher confidence, F(2,47) = 16.65, p < .001, η² = .415, 95% CI [.193, .574]. Importantly, this effect of feedback on confidence was not explained by objective performance, as RT and accuracy did not change as a function of feedback (accuracy: χ²(2, N = 30987) = .30, p = .863, V = .003, 95% CI [0, .011]; RT: F(2,48) = 2.06, p = .14, η² = .079, 95% CI [0, .235]).
We fitted the LDC model to the performance (accuracy and RT) and confidence reports in the test phase of this experiment, separately for each participant. LDC model predictions were then generated using the best fitting parameters for each individual. As can be seen in Fig 3, the LDC model provided an excellent fit to the data (see also S1 Fig for the observed and model predicted relationship between RT and confidence). Only confidence for incorrect trials at easier difficulty levels seemed to be overestimated by the model. This can be explained by the relatively small number of trials underlying these data points: 2.7% of all trials on average for the easy trial difficulty (i.e. less than 6 trials per feedback condition), and 5.3% for average trial difficulty (i.e. about 11 trials on average per feedback condition). Similar to the empirical data, feedback significantly influenced model-generated confidence ratings (F(2,48) = 9.79, p < .001, η² = .290, 95% CI [.083, .465]), but did not influence the performance data (RT: F(2,48) = 1.19, p = .31, η² = .047, 95% CI [0, .185]; Accuracy: χ²(2, N = 30987) = .75, p = .69, V = .005, 95% CI [0, .014]). Thus, our model was able to capture the data pattern, namely that confidence reports can be influenced independently from behavioral performance.
We next investigated the estimated parameters of the model (Fig 4). Given that feedback selectively influenced confidence ratings, we expected a significant change in the confidence-specific parameters (i.e., α or β), but no variation in the DDM parameters (non-decision time, drift rate, decision threshold). Indeed, feedback had an influence on estimated α (F(2,382) = 6.56, p = .002, η² = .

Fig 2. In both experiments, participants performed three different perceptual decision-making tasks (only one shown here). Each task started with a training phase during which a different feedback manipulation was induced. A. In Experiment 1, participants received fake feedback after each training block, framed as a comparison between their performance and the performance of a reference group. B. In Experiment 2, participants additionally rated their confidence before receiving trial-by-trial feedback reflecting their probability of making the correct choice. Unknown to participants, the feedback was actually generated by the LDC model behind the curtain. To do so, the evidence accumulation process for each trial was estimated using the mean drift rate and boundary from a previous pilot session (see Methods for full details). Feedback conditions differed in the α (resp. β) value used to generate feedback in Experiment 2A (resp. Experiment 2B). C. In both experiments, after each training phase participants completed a test phase during which they no longer received feedback but rated their decision confidence after each decision.
Experiment 2: Dissociating parameter-specific effects on confidence ratings. Our next aim was to demonstrate that humans are sensitive to the specific parameterization of decision confidence proposed by the LDC framework. If confidence is computed using a low-dimensional solution, it should be possible to independently manipulate its parameters. Therefore, in a new set of two experiments, we aimed to induce selective changes in each parameter (reliability (α) or bias (β)) of the model.
The general design of both experiments was similar to Experiment 1: we manipulated the feedback during a training phase and investigated the impact of that manipulation on confidence ratings reported in a subsequent testing phase. Rather than presenting fake feedback every 24 trials, we adopted a novel approach where feedback during the training phase was presented after each trial in the form of a continuous value (Fig 2). Participants were told that this value reflected the probability that their response was correct (e.g., .8 vs .4, indicating that there was a high vs low probability that they just made a correct choice). Unknown to participants, the exact feedback value was generated by LDC behind the curtain (see Methods for full details). Both experiments comprise a baseline condition (α = 18; β = 0) in which the feedback presented to participants reflected the model-approximated probability of a choice being correct. In Experiment 2A, the value of α that was used to generate the feedback was selectively manipulated between conditions. In addition to the baseline condition there was a minus condition where α was decreased (α = 9), and a plus condition where α was increased (α = 36). In Experiment 2B, the same procedure was used except that now the value of β was selectively manipulated between conditions (β = -1 in the minus condition and β = 1 in the plus condition).
A dissociable effect of manipulated feedback on confidence according to the parameter manipulated. Similar to Experiment 1, we expected participants in Experiment 2A and 2B to adapt how they compute confidence depending on the feedback received during the training phase. Given that the feedback was generated in the training phase by manipulating specific parameters in the LDC model, we expected that participants would learn to compute confidence using the same parameter settings in the training and subsequent test phase. As previously described, the reliability parameter α reflects how strongly individuals weigh evidence in their computation of confidence. Given that accuracy is closely related to the amount of available evidence, correct trials tend to have considerable supporting evidence when reporting confidence, whereas error trials usually have little to no supporting evidence. Given that α weighs evidence, a decrease (in the α-minus condition) or an increase (in the α-plus condition) of α is therefore expected to differently impact confidence for correct trials (strong influence) than for error trials (little to no influence). In contrast, the parameter β reflects a stimulus-independent confidence bias, so providing participants with β-manipulated feedback is expected to lead to changes in confidence irrespective of choice accuracy. The reasoning for this prediction is that β is not concerned with the evidence provided by the stimulus (nor by the response), as it simply adds (in the β-plus condition) or subtracts (in the β-minus condition) a constant to the (logit of the) confidence judgment regardless of what happens during the trial.
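The dissociation predicted here can be made concrete with a small worked example. It uses the α and β values of the feedback conditions reported for Experiment 2 (α of 9, 18 and 36; β of -1, 0 and 1) and assumes the readout form confidence = 1 / (1 + exp(−(α·e·x + β)/√t)). The two example trials are hypothetical: a correct trial with some supporting evidence and an error trial with no supporting evidence, both at t = 1.

```python
import math


def ldc_confidence(e, t, x, alpha, beta):
    """Assumed LDC readout: logistic of (alpha * e * x + beta) / sqrt(t)."""
    return 1.0 / (1.0 + math.exp(-(alpha * e * x + beta) / math.sqrt(t)))


# Hypothetical trials (not taken from the data).
t, x = 1.0, 1
e_correct = 0.10  # correct trial: evidence supports the choice
e_error = 0.0     # error trial: little to no supporting evidence

alpha_levels = {"minus": 9.0, "baseline": 18.0, "plus": 36.0}   # Experiment 2A
beta_levels = {"minus": -1.0, "baseline": 0.0, "plus": 1.0}     # Experiment 2B

# Manipulating alpha with beta held at baseline.
alpha_effect = {
    name: (ldc_confidence(e_correct, t, x, a, 0.0),
           ldc_confidence(e_error, t, x, a, 0.0))
    for name, a in alpha_levels.items()
}

# Manipulating beta with alpha held at baseline.
beta_effect = {
    name: (ldc_confidence(e_correct, t, x, 18.0, b),
           ldc_confidence(e_error, t, x, 18.0, b))
    for name, b in beta_levels.items()
}

for name in ("minus", "baseline", "plus"):
    print(name, "alpha:", alpha_effect[name], "beta:", beta_effect[name])
```

In this toy case, changing α moves confidence on the correct trial while leaving the zero-evidence error trial untouched, whereas changing β moves both, which is the qualitative pattern the two experiments are designed to separate.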
LDC model fits. We next performed model comparison to explore whether the different patterns of confidence ratings observed in Experiment 2A and 2B would be best explained by a change in the targeted parameter (i.e. a change in α in Experiment 2A and a change in β in Experiment 2B). Two candidate LDC models were fit to the accuracy, RT and confidence data of both experiments. Each model differed in whether α or β was fixed between feedback conditions: in the α-free model, only α was allowed to vary between feedback conditions, whereas in the β-free model, only β was allowed to vary between feedback conditions. As recommended in [33], we investigated how well simulations from the best-fitting parameters from both the α-free and the β-free models were able to reproduce the observed behavioral effects. Specifically, we defined a confidence contrast that captured the qualitative signatures seen in the feedback presented. Since the difference in feedback between the baseline and the plus condition was negligible relative to how both conditions differed from the minus condition in both experiments, we computed our confidence contrast as average confidence in the minus condition subtracted from average confidence in the baseline and the plus condition. Fig 5E and 5F show the empirical confidence contrast as well as the distribution of the mean predicted confidence contrast for both the α-free and the β-free model obtained via bootstrapping. In Experiment 2A, the confidence contrasts predicted by the α-free and the β-free models were highly similar for correct trials, and both matched the empirical data well. However, while the α-free model closely captured the confidence contrast in errors and hence the interaction, the β-free model overestimated the effect in errors, which led it to underestimate the interaction. Similarly, in Experiment 2B, both models accurately captured the empirical confidence contrast in correct trials. Additionally, the β-free model precisely reproduced both the empirical confidence contrast in error trials and the interaction, whereas the α-free model clearly underestimated the confidence contrast in error trials, which led to predicting an interaction that was not present in the empirical data. We conclude that the α-free (resp. β-free) model fits best to the experiment where α (resp. β) was manipulated.
To further confirm that the α-free (resp. β-free) model is the most likely to explain the results of Experiment 2A (resp. 2B), we additionally quantified the goodness-of-fit of each model using Bayesian information criterion (BIC). Four additional candidate models were included in that analysis. To ensure that the feedback effects were best captured by a change in one parameter only, we included a null model where neither α nor β varied between feedback conditions and a full model where both α and β were free to vary between conditions. To investigate whether a pure Bayesian readout could capture such data, we also included two models computing the exact probability of making a correct choice given accumulated evidence and time, based on the estimated drift rates. In the first Bayesian readout model, the drift rates were fixed between conditions (just like with the other LDC candidate models). Importantly, the estimated DDM parameters for the LDC models and this Bayesian readout model were identical. These models therefore only differ in how inputs from post-decisional DDM (i.e. post-decision evidence and post-decision RT) are converted into confidence, and can only be distinguished on the goodness of their fit to confidence ratings. Since fixing drift rates between feedback conditions effectively prevents the Bayesian readout model from returning any difference in confidence between conditions, we included a second Bayesian readout model where drift rates were allowed to vary between feedback conditions. Table 1 reports the difference in mean BIC across participants of each candidate model compared to the best model, separately for both experiments. A first conclusion is that all LDC models considerably outperformed the Bayesian readout models. Even the Null LDC model, effectively blind to the feedback effects on confidence, showed better BIC than the Bayesian readout model with varying drift rates between feedback conditions (in theory able to account for feedback effects). Second, both the α-free and β-free model outperformed the null model (i.e., providing strong evidence for a change in the parameters) as well as the full model (i.e., providing strong evidence for a selective change in the parameters). Third, as expected the α-free model showed the lowest BIC for the data of Experiment 2A. Surprisingly though, the α-free model also slightly outperformed the β-free model in Experiment 2B. Overall, the difference in BIC between the α-free and the β-free models appears marginal compared to how strongly they each outperformed the null and full models. Additionally, the difference in BIC between the α-free and the β-free models was bigger in Experiment 2A (Δ BIC = 4.15), where the α-free model was expected to be the best performing model, compared to the difference observed in Experiment 2B (Δ BIC = 2.54). Applying categorical cutoffs to describe the magnitude of the evidence in favor of the α-free model in both experiments, such as the rule of thumb proposed by [34], leads to the conclusion that the α-free model has considerably more support than the β-free model in Experiment 2A, but only weak support in Experiment 2B.

Fig 5. A key prediction of the LDC model is that participants should be sensitive to the specific parametrization of confidence proposed by the model. To test this, Experiment 2 provided participants with probabilistic feedback generated by the LDC model. Critically, LDC based feedback was generated using different levels of α or different levels of β. A. Changing α influences the confidence for correct trials but not for errors. B. Changing β influences feedback for both corrects and errors. The pattern that we saw in the feedback (which effectively are our predictions) was also seen in the behavioral data. C. α-manipulated feedback influenced confidence reports for correct but not error trials. D. β-manipulated feedback influenced confidence reports on both correct and error trials. E. Fitting the LDC model to the empirical data of Experiment 2 revealed that data in the α-manipulated feedback was best explained by a model in which α was allowed to vary. F. Data from the β-manipulated feedback was best explained by a model in which β was allowed to vary. To visualize this, we computed confidence contrasts for the empirical data (black lines), as well as for the α-free (yellow distribution) and β-free (blue distribution) model fit, separately for corrects and errors. "Interaction" refers to the difference between the confidence contrast in corrects and errors. G. Parameter estimates from the α-free model. Estimated α was higher when feedback was generated by higher α. H. Parameter estimates from the β-free model. Estimated β was higher when feedback was generated by higher β. As a reference, the values of
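To give a sense of scale for the BIC differences reported above, the sketch below applies the standard BIC definition (k·ln(n) − 2·ln(L)) and the usual BIC-weight transformation (weights proportional to exp(−ΔBIC/2)) to the two reported mean differences. Applying the weights to mean differences across participants is a simplification made for this illustration; it is not presented as the analysis performed in the paper, and the function names are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def bic_weights(bics):
    """Relative model weights, proportional to exp(-0.5 * (BIC - min BIC))."""
    best = min(bics)
    rel = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(rel)
    return [r / total for r in rel]


# Two-model comparison, alpha-free (reference, delta = 0) vs beta-free,
# using the mean BIC differences reported in the text.
weights_exp2a = bic_weights([0.0, 4.15])  # Experiment 2A
weights_exp2b = bic_weights([0.0, 2.54])  # Experiment 2B

print("Exp 2A (alpha-free, beta-free):", [round(w, 3) for w in weights_exp2a])
print("Exp 2B (alpha-free, beta-free):", [round(w, 3) for w in weights_exp2b])
```

The resulting weights favor the α-free model in both cases, more strongly for the Experiment 2A difference, without reaching the near-certainty that much larger BIC differences would produce.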
We also performed group-level Bayesian model selection to determine which model better accounts for the data [35,36]. Using BIC weights as model evidence, we performed two different analyses. In the first one, we included all four LDC candidate models. Second, since the α-free and β-free models did not differ much in mean BIC and both clearly outperformed the other candidate models, we also looked at the results of group-level Bayesian model selection with only these two models included in the analysis. In this analysis framework, candidate models are treated as random effects with fixed (unknown) distribution in the population (i.e. the data of all participants is no longer assumed to originate from the same model). Therefore in the following results, we reason in terms of model frequency within the population. The Bayesian Omnibus Risk (BOR) was consistently low when all four LDC candidate models were included (Experiment 2A: BOR = .005, Experiment 2B: BOR = .027), suggesting that one model was more frequent than the others. We next looked at the maximum Exceedance Probability (maxEP) to further investigate which model was found to be more frequent than the others. Consistent with the results from mean BIC, the α-free model had the highest EP in both experiments (Experiment 2A: maxEP = .996, Experiment 2B: maxEP = .987). Interestingly, when only the α-free and β-free models were included in the analysis, the BORs were much higher in both experiments (Experiment 2A: BOR = .365, Experiment 2B: BOR = .253), indicating that the models have a significant chance of being equally frequent. This last result further highlights the difficulty of drawing a clear conclusion from such small differences in mean BIC between both models. In light of this, and following the recommendations of [33], it appears more relevant to select the winning model according to its ability to reproduce the observed behavioral effect of interest (namely, the confidence contrasts). Still, it remains an open question for future work to determine why exactly the α-free model obtained a relatively better fit overall. Lastly, to verify that the model-validated effects observed in Experiment 2A and 2B are caused by increased α (resp. β) values in the conditions where feedback was generated from higher α (resp. β) values, we looked at the estimated α and β from the α-free model fits in Experiment 2A and the β-free model fits in Experiment 2B (Fig 5G and 5H). As expected, there was an effect of feedback condition on estimated α in Experiment 2A, F(2,318) = 16.56, p < .001, η² = .09, 95% CI [.04, .16], with lower estimated α in the minus condition compared to the control and plus conditions (ps < .001), but no significant difference between the control and the plus conditions (p = .55). Similarly in Experiment 2B, estimated β was found to be different across feedback conditions, F(2,270) = 26.44, p < .001, η² = .16, 95% CI [.09, .24], with again lower estimated β in the minus condition compared to the control and plus conditions (ps < .001) but no difference between the control and the plus conditions (p = .99). Taken together, these results suggest that theoretically motivated confidence manipulations can lead to specific and theoretically predicted changes in confidence.
Experiment 3: Feedback manipulations influence confidence in a transfer task. In Experiments 1 and 2, trials in the training and testing phases were identical (i.e., same task, same difficulty levels). Therefore, a final potential concern is that, instead of changing their mapping of confidence (i.e., changing their α and β), participants might have learned to give a specific confidence rating for the trials they were trained on. We resolved this ambiguity by testing whether the effect of feedback can also be observed when trials in the testing phase are perceptually different from the trials in the training phase. In Experiment 3, participants were consistently trained on either the letter discrimination task or the dot color task, where they received β-manipulated feedback (i.e., the β-minus and β-plus conditions of Experiment 2B), and were then tested on either the same task (Same condition) or a different task (Transfer condition).

Discussion
How to incorporate the sense of confidence in models of decision-making has been the focus of much recent work.An influential framework is based on the Bayesian interpretation of confidence [3,[37][38][39], namely that confidence reflects the probability of being correct given both accumulated evidence and elapsed time [14,15,17].In order to accurately compute this probability, it is necessary to know how to compute confidence based on the agent's available data (evidence and time).Currently, a computationally plausible account describing how individuals learn this mapping is lacking.In the current work, we introduced the LDC model, which provides a tractable and flexible account of decision confidence.Using simulations, we first showed that LDC provides a highly reliable approximation of the true probability correct.Fitting this model to empirical data revealed that LDC accounts very well for human confidence ratings.Critically, using a novel feedback manipulation, we validated two key predictions from the model, namely that changes in confidence can be independent of performance, and that independently manipulating the reliability (α) and bias (β) parameters elicit clearly dissociable and identifiable effects on confidence.

Introducing tractability and flexibility to decision confidence modelling
The LDC model belongs to the family of DDM-based models of decision confidence. Here, confidence is conceptualized as a (Bayesian) readout of the probability of a correct choice given evidence, time and choice. Existing models following that approach have been successful in explaining a wealth of data, including the link between confidence and RT [14,17], and deviations from accuracy through the contribution of priors [18,19]. Estimating the probability correct based on the available data, however, is computationally intractable. The LDC model therefore proposes to approximate the Bayesian readout with a logistic function, offering a tractable account of how humans compute confidence. Importantly, even though it describes an approximation to an optimal solution, the LDC model follows the same principles as the optimal Bayesian readout and uses the same information to compute confidence. It thus fundamentally differs from heuristic models that describe the computation of confidence as based on performance-unrelated cues [40]. On another note, while the LDC model solves the problem of estimating confidence for each point on the (evidence, time) plane, it leaves open how agents know about the appropriate parameter settings (i.e., how they learn the correct mapping). Our behavioral results suggest that the parameters might be learned from external feedback; this perspective should be investigated further in future work by looking at the dynamics of this potential learning process.
To increase flexibility and account for deviations from optimality, the LDC model relies on two free parameters, which control the reliability of evidence (α) and bias (β) in the computation of confidence.A different class of confidence models that can account for biases and deviations between confidence and accuracy is based on SDT framework [41][42][43][44][45][46].For that purpose, these models typically either assume the existence of metacognitive noise [43][44][45], and/or consider that confidence is not entirely derived from the same signal as the primary decision [41][42][43][44]46].A recent study comparing the different SDT models of confidence on simple perceptual tasks proposed that confidence is simply computed as a noisy readout of the evidence used for the primary decision [47].Although the LDC model is grounded within the DDM tradition which conceptualizes confidence as the Bayesian probability correct, it does not critically hinge upon the specifics of the DDM.It would be straightforward to construct a simplified version of the LDC model which ignores the element of time.This would allow to directly compare the LDC approach to recent SDT models of confidence.Crucially, with its parameters, our model can flexibly account for the different types of idiosyncrasies, biases and deviation from the optimal Bayesian readout [22][23][24][25], which are all merged into a single metacognitive noise parameter in most SDT frameworks.

Confidence can vary independently from task performance
In all experiments, we observed that decision confidence was influenced by the feedback manipulation, whereas objective performance was not. This finding rules out an interpretation whereby the feedback influenced task performance, and changes in confidence simply reflect this change in performance. Indeed, some previous work has shown that changes in confidence can be explained by subtle differences in RT [14,48]. This was not the case in the current experiments, because neither accuracy nor RT was influenced by the feedback manipulations. As such, it is unlikely that pure Bayesian read-out models can account for the effects observed in the current work, as they do not allow for confidence-specific parameters [14][15][16]. In contrast, LDC accurately captured the effect of feedback on confidence in the absence of changes in objective performance, thus attesting to the flexible nature of the LDC model. Previous studies have unraveled several other factors that influence the reported level of decision confidence while leaving task performance unaffected, for example emotional states [30,49], working memory content [32] and age [50,51]. Besides, dissociations between performance and metacognition have long been reported in cases such as blindsight [10,52], where individuals with lesions in primary visual cortex show above-chance performance on visual tasks despite reporting no awareness of the stimuli. The opposite pattern of low performance linked with high confidence has also been observed. Change blindness (i.e., failure to detect major differences between two images while they flicker off and on) is a typical example of such a metacognitive error, where individuals believe they would be able to detect major changes despite being unable to do so [11]. These examples highlight how ubiquitous dissociations between performance and metacognition are. By incorporating free parameters controlling for evidence reliability and bias into the computation of confidence, the LDC model is in principle flexible enough to account for such dissociations.

Humans can independently tune evidence reliability and bias in confidence
In Experiment 2A and 2B, we aimed to selectively manipulate confidence ratings according to each parameter of the LDC.By providing model-generated feedback from different α's in Experiment 2A and different β's in Experiment 2B, we revealed clearly distinct patterns of confidence ratings according to the parameter manipulated.Moreover, the empirically observed patterns were best captured by models where the manipulated parameter was set as a free parameter (e.g.α-free model when feedback was α-manipulated).One might argue that the magnitude of the observed effects of feedback on confidence was on a significantly smaller scale than the actual differences in feedback between conditions.Similarly, the across conditions difference between the best-fitting parameters was on a much smaller scale than the differences in feedback-generating values.This mainly shows that participants do not come to the experiments with a blank slate, and do not use external feedback only to adjust their estimation of confidence.Instead, participants have their own prior beliefs on their performance at the start of a new task, and partially use external feedback in combination with internallygathered information about the task to update how they compute confidence (similar to how individuals integrate both internal and external forms of feedback to adjust their performance, [53]).Given the number of trials in the training phase used to induce a change in confidence (120), and the subtlety of the feedback manipulations, it was actually non-trivial that any change in the fitted values of α and β was observed.
These results imply that individuals can change their computation of confidence consistently with our parameterization of confidence, providing strong validating evidence in favor of LDC.This observation raises the intriguing possibility that individuals might exert control over the parameters governing the computation of confidence in a way that maximizes utility.Intuitively, computing confidence in such a way that it closely matches the Bayesian readout seems like the rational strategy to optimize utility, as it would allow to optimize behavior based on the best possible internal evaluation of that behavior [5,7,9].In some contexts, however, other factors than informativeness play a role in the utility of confidence.When competing for shared limited resources, expressing overconfidence plays a key role in convincing other agents not to compete for the resource (i.e."bluffing"; [54,55]).Errors caused by overconfidence, though, bear a high cost in such strategy.In such a context, the optimal way to compute confidence seems to be an increase in the evidence reliability estimate (α), which will lead to higher confidence for scenarios with much evidence (i.e., overconfidence when you are likely to win the competition) but lower confidence for scenarios with little evidence (i.e., when you are likely to lose the competition).Increasing β in this scenario is likely suboptimal because this produces overall high confidence, also for scenarios with little evidence.The opposite scenario might be true in a social decision-making context.If confidence is used to assert influence rather than to convey accuracy [56], the optimal strategy might be an overall increase in β, resulting in general overconfidence (i.e.irrespective of accuracy) to push forward one's choice.These examples show that what is traditionally treated as deviations from the optimal Bayesian readout can sometimes be considered as optimal through the lenses of utility maximization.

Beyond dichotomies with model-informed feedback
In contrast with the binary "correct/error" feedback typically provided in lab experiments, feedback received in daily life is not always clear-cut.Individuals must often make sense of noisy and probabilistic feedback cues (e.g.how should a street-artist interpret a subtle nod from a spectator?).Continuous feedback has been used in the past to communicate performance relative to other (hypothetical) participants [19,57] or to give average accuracy over several past trials [58,59].However, in the current work we designed a novel feedback manipulation which provides continuous feedback about choice accuracy on a trial-by-trial basis.It is important to note that our instructions simply stated that feedback would reflect the probability of being correct on a single trial, without much more explanation as to how this proportion was calculated.A skeptical participant could reasonably doubt the trustworthiness of the feedback, since it might seem unlikely that we provide an "accurate" probability of being correct on a single trial basis (e.g. is a feedback of 80% vs 70% really informative, or are the values pure noise added by the experimenter).Despite these potential obstacles, our feedback manipulation did produce the confidence patterns we predicted, hence validating our modelgenerated feedback approach.This nuanced way of providing feedback goes beyond the mere distinction between dichotomous valid versus invalid feedback [60], and offers a promising framework to control the level of ambiguity and informativeness of trial-by-trial feedback, allowing to study in a more fine-grained manner how individuals process and are impacted by more realistic, ambiguous feedback [61,62].

Interpreting the LDC parameters
An appealing property of computational models is that their parameters often have clear interpretations, and can be selectively manipulated [13,63], although it is subject of recent debate [64].Similarly, in LDC, evidence and time are mapped onto confidence by means of a reliability parameter, α, and a confidence bias parameter, β.Our reliability parameter, α, can be interpreted as an individual's estimate of the precision of evidence.This interpretation is similar to the recently proposed concept of "meta-uncertainty", which is described as "the subject's uncertainty about the uncertainty of the variable that informs their decision" [65].In both the LDC model and [65]'s CASANDRE model, one's estimate of evidence reliability weighs how evidence is used to compute confidence.Note that an important difference is that in CASANDRE the estimate is assumed to be correct on average (i.e.individuals are assumed to have an uncertain, but on average correct estimate of evidence reliability), whereas one of the key points of the LDC model is that participants can have incorrect values of α.
The second parameter of LDC, β, globally increases or decreases confidence. Interestingly, β is scaled by $1/\sqrt{t}$. Another possibility for our model could be to have a time-independent confidence bias parameter. Such a hypothetical parameter would become more important in the computation of confidence with time relative to evidence, which is scaled by $1/\sqrt{t}$. In contrast, β reflects the same constant shift in confidence for all accumulation times. It therefore straightforwardly relates to the metacognitive bias described in other models of confidence that ignore RTs [66]. It is interesting to note that β has no lower boundary, and therefore in theory it allows for cases where agents with a very low (negative) β would constantly judge that they made an error. While this scenario may seem irrational, some individuals may indeed be so insecure about their decisions that they would often rate their confidence below the guess threshold. It remains unclear whether such behavior occurs empirically. For now, we simply note that the LDC model can in principle account for it, potentially opening new doors to understand metacognitive impairments in future work [23][24][25]45].
In light of this interpretation of α and β, one can further interpret specific patterns in the data. For example, in Experiment 1, we observed a change in α in response to negative feedback (with a significantly lower estimated α compared to the other two conditions), indicating that participants judged evidence as less reliable after receiving negative feedback. In contrast, we observed a change in β after positive feedback (with a significantly higher estimated β compared to the other two conditions), suggesting a general overconfidence bias after receiving positive feedback. This dissociation suggests that, despite similar effects at the behavioral level, the LDC model makes it possible to tease apart the origins of confidence biases, e.g., in response to positive vs negative feedback. It will be interesting to investigate in future work how experimental factors influencing confidence can be differentially explained in terms of evidence reliability (α) or general under- or overconfidence (β).
Finally, we note that in the current parameterization of confidence, identical to the Bayesian readout, confidence always depends on $\sqrt{t}$. However, the influence of time on confidence might vary according to the task or individual. To account for such hypothetical sources of variability, one could expand the LDC model by further parameterizing the influence of time with a third parameter, γ, and replace $\sqrt{t}$ in Eq (3) with $t^{\gamma}$. The model then has an accurate calibration of how time influences confidence when γ = 0.5, and overweighs (resp. underweighs) time in the computation of confidence when γ > 0.5 (resp. γ < 0.5). By doing so, future work might investigate whether variability in the relation between confidence and decision time can be captured by the extended LDC model.
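The parameterization discussed in this section can be sketched in a few lines. Eq (3) is not reproduced in this part of the text, so the functional form below is an assumption: a logistic function of signed evidence weighted by α plus the bias β, both divided by $t^{\gamma}$, with γ = 0.5 recovering the $\sqrt{t}$ scaling. The function name `ldc_confidence` and all argument names are hypothetical.

```python
import math

def ldc_confidence(evidence, t, choice, alpha, beta, gamma=0.5):
    """Assumed LDC mapping from (evidence, time, choice) to confidence.

    evidence: accumulated evidence at the time of the confidence report
    t:        elapsed accumulation time (must be positive)
    choice:   +1 or -1, so evidence * choice is evidence for the chosen option
    gamma:    exponent on time; 0.5 reproduces the sqrt(t) scaling
    """
    if t <= 0:
        raise ValueError("t must be positive")
    x = (alpha * evidence * choice + beta) / (t ** gamma)
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative values only (alpha = 18 and beta = 0 are the baseline feedback values).
print(ldc_confidence(evidence=0.1, t=1.0, choice=1, alpha=18.0, beta=0.0))
# A positive beta shifts confidence upward even without any evidence.
print(ldc_confidence(evidence=0.0, t=1.0, choice=1, alpha=18.0, beta=1.0))
# With gamma > 0.5, the same evidence yields lower confidence at long times (t > 1).
print(ldc_confidence(evidence=0.1, t=4.0, choice=1, alpha=18.0, beta=0.0, gamma=1.0))
```

In this sketch α changes how steeply confidence grows with evidence, whereas β shifts confidence for all evidence levels, which is the dissociation the two parameters are meant to capture.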

Conclusion
We introduced the LDC model, a new model of decision confidence that offers a tractable and flexible approximation of confidence as the Bayesian probability of making the correct decision.The model provides a low-dimensional parametrization of decision confidence which allows efficient estimation of confidence, while at the same time accounting for idiosyncrasies and different kinds of confidence biases.This parameterization of confidence was validated in two experiments showing a distinct pattern of confidence ratings after specifically manipulating the mapping according to each parameter of the model.

Experiment 1
Ethics statement.All procedures were approved by the Katholieke Universiteit Leuven (KU Leuven) ethics committee.
Participants.Fifty participants (eight men, one third gender, age: M = 19, SD = 4.9, range 17-52) took part in Experiment 1 (two excluded due to chance level performance).All participants participated in return for course credit and read and signed a written informed consent at the start of the experiment.Detailed methods and analyses for Experiment 1 have already been reported in [19].We briefly report the general procedure here.
Procedure. Participants completed three decision-making tasks: a dot color task, a dot number task and a letter discrimination task. Each task started with 120 training trials. Feedback during training was presented at the end of blocks of 24 trials. Unknown to participants, feedback was predetermined to be either good, average or bad for a specific task, and feedback scores were randomly sampled according to the feedback condition. Each participant received good feedback on one task, average feedback on another task, and bad feedback on a third task (order and mapping with tasks counterbalanced between participants). After the training phase of a task, participants performed 216 test trials where feedback was no longer provided. Instead, confidence ratings were queried at the end of each trial. For each task, there were three levels of stimulus difficulty (easy, average, or hard).
Dot color task. On each trial, participants decided whether a box contained more (static) blue or red dots. The total number of dots was always 80, with differing proportions of red or blue dots depending on the difficulty condition. The position of dots was randomly generated on each trial.
Dot number task. On each trial, two boxes were presented, one of which contained 50 dots and the other more or less than 50 dots. Participants decided which of the two boxes contained the largest number of dots. The exact number of dots in the variable box differed depending on the difficulty condition. The position of dots was randomly generated on each trial.
Letter discrimination task.On each trial, participants decided whether a field contained more X's or O's.The total number of X's and O's was always 80, with differing proportions of X's or O's depending on the difficulty condition.The position of the letters was randomly generated on each trial.

Experiment 2
Ethics statement. All procedures were approved by the KU Leuven ethics committee.
Participants. Forty-three participants (8 men, 35 women, age: M = 18.49, SD = 1.03, range 16-22) took part in Experiment 2A. Forty-two participants (9 men, age: M = 18.83, SD = 2.05, range 17-29) took part in Experiment 2B. Due to chance performance on at least one of the tasks, we removed 3 participants from Experiment 2A and 3 participants from Experiment 2B. Five additional participants were removed from Experiment 2B due to (almost) no variability in their confidence reports (i.e., they used the same report on more than 90% of the trials). Two participants passed our a priori inclusion criteria but showed behavior that could indicate a misunderstanding of the confidence scale (i.e., chance-level performance or higher for all confidence ratings). Removing these two participants from the analysis did not significantly change the behavioral results. All participants took part in return for course credit and signed informed consent at the start of the experiment.
Stimuli and apparatus. All experiments were conducted on 22-inch DELL monitors with a 60 Hz refresh rate, using PsychoPy3 [67]. All stimuli were presented on a black background centered on the middle of the screen (radius 2.49° visual arc). Stimuli for the dot number task were presented in two equally sized boxes (height 20°, width 18°) at an equal distance from the center of the screen. Stimuli for all other tasks were presented in one box (height 22°, width 22°).
Procedure.In both experiments, participants completed three decision-making tasks: a dot color task, a shape discrimination task and a letter discrimination task.Each task started with 108 training trials.After each choice, participants rated their confidence level and then received (continuous) feedback about their performance.After the training phase of a task, a test phase of 216 trials followed which was identical to the training phase, except that feedback was omitted.Every trial was assigned one of three possible difficulty levels.The difficulty levels were matched between the three tasks based on a pilot staircase session.For all tasks, a trial started with a fixation cross that was presented for 500 ms, after which the stimulus appeared for 200 ms or until a response was given.Participants indicated their choice using the C or N key using the thumbs of both hands.There was no time limit for responding, although participants were instructed to respond as fast and accurately as possible.After each choice, participants rated their confidence on a 6-point scale, labeled from left to right: 'sure error', 'probably error', 'guess error', 'guess correct', 'probably correct', and 'sure correct' (reversed order for half the participants).Confidence was indicated using the 1, 2, 3, 8, 9 and 0 keys at the top of the keyboard with the ring, middle and index fingers of both hands.There was no time limit for indicating confidence.During the training phase only, a trial ended with a visual presentation of feedback.An empty horizontal rectangle was filled in white starting from the left end of the rectangle (reversed order for half the participants, matched to the confidence counterbalancing).The proportion filled corresponded to the probability that the response was correct (e.g.halfway filled if feedback is 50%).Ticks at the 0, 25, 50, 75 and 100 percent marks were respectively labeled 'sure error', 'probably error', 'random chance', 'probably correct' and 'sure correct'.
On each trial, participants decided whether a box contained more elements from one out of two categories.In the letter discrimination task, elements were A's or B's, in the dot color task, blue or red dots and in the shape discrimination task, squares and circles.The total number of elements in a box was always 80, with the exact proportion of each element depending on the difficulty condition.The position of the elements was randomly generated on each trial.

Experiment 3
Participants.Thirty-four participants (12 men, age: M = 19.5,SD = 3.28, range 18-32) took part in both sessions of Experiment 3.All participants took part in return for course credit and signed informed consent at the start of the experiment.All procedures were approved by the KU Leuven ethics committee.
Procedure.Similar to the previous experiments, Experiment 3 was divided into a training phase (120 trials where participants received model-generated feedback) and a subsequent testing phase (180 trials where no feedback was provided).Crucially, participants were always trained on the same task and then tested on the same task (Same condition) and on a different task (Transfer condition, randomized order between participants).The testing tasks were the letter discrimination task and the dot color task.Half of the participants were trained on the letter discrimination task while the other half was trained on the dot color task.Feedback was manipulated according to β in two within-participants conditions, identical to Experiment 2B's minus and plus conditions.The experiment was organized in two sessions of 1 hour, separated by 6 to 8 days.A different feedback condition was assigned to each session (randomized order across participants).There were two training and testing phases for each feedback and transfer condition, for a total of 240 training trials and 360 testing trials per feedback and transfer condition.
Model-generated feedback. Instead of binary feedback (correct/error), feedback during the training phase was provided after each trial in the form of a continuous value. Participants were told that this value reflected the probability that their response was correct. In reality, the feedback was generated by our model of confidence. To do so, we estimated the single-trial evidence accumulation process online (i.e., during the experiment), assuming that performance was equivalent to the average performance observed in piloting sessions. In other words, we assumed that the current decision threshold and drift rate were equal to the average decision threshold and drift rate from piloting sessions. At the moment a decision is made, the evidence accumulation process has just reached the decision threshold. We thus inferred that the amount of accumulated evidence at the time of decision was equal to the average decision threshold estimated from the pilot sessions. Then, to estimate the total amount of accumulated evidence at the time of the confidence report, we added the post-decisional evidence, estimated by running a random walk for a duration fixed to the observed confidence RT and with a drift rate set to the average drift rate estimated from the pilot sessions (the sign of which depended on whether the response was correct or not). Feedback was thus equal to model confidence computed from that total evidence and the total time (decision RT + confidence RT) according to a fixed (α, β) pairing, the value of which depended on the condition and experiment.
Feedback conditions.In a baseline condition, the feedback presented to participants reflected the actual model-generated probability of a choice being correct.To get the value of α and β that best approximate the true probability of a choice being correct, we estimated both parameters based on the heatmap generated by the drift rates observed in the pilot sessions.In the baseline condition, α was thus set to 18 and β to 0. In Experiment 2A, for one task feedback was computed using a lower value of α (namely 9), and for another task feedback was computed using a higher value of α (namely 36; termed "α-plus" condition).The association between the manipulation of α and the task was counterbalanced across participants.In Experiment 2B, feedback was provided according to the baseline condition in one task, using a lower value of β in another task (-1), and using a higher value of β in another task (1).
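The feedback-generation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the experiment code: the logistic form of confidence is assumed (Eq (3) is not shown here), evidence is expressed relative to the chosen option, α is assumed to stay at its baseline value of 18 in the β conditions, and the pilot threshold `pilot_a`, pilot drift rate `pilot_v`, σ, τ and the example RTs are hypothetical placeholders.

```python
import math
import random

# (alpha, beta) per feedback condition; values taken from the text, with alpha
# assumed to remain at baseline (18) when beta is manipulated.
CONDITIONS = {
    "baseline": (18.0, 0.0),
    "alpha_minus": (9.0, 0.0),
    "alpha_plus": (36.0, 0.0),
    "beta_minus": (18.0, -1.0),
    "beta_plus": (18.0, 1.0),
}

def model_feedback(correct, decision_rt, confidence_rt, condition,
                   pilot_a, pilot_v, rng, sigma=0.1, tau=0.001):
    """Return feedback in (0, 1) for one trial (times in seconds)."""
    alpha, beta = CONDITIONS[condition]
    # Post-decisional drift favors the chosen option only if the choice was correct.
    drift = pilot_v if correct else -pilot_v
    n_steps = max(int(round(confidence_rt / tau)), 0)
    post = sum(drift * tau + sigma * math.sqrt(tau) * rng.gauss(0.0, 1.0)
               for _ in range(n_steps))
    evidence = pilot_a + post          # threshold reached at decision + post-decisional part
    total_time = decision_rt + confidence_rt
    x = (alpha * evidence + beta) / math.sqrt(total_time)
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(1)
for name in CONDITIONS:
    fb = model_feedback(True, 0.6, 0.4, name, pilot_a=0.08, pilot_v=0.1, rng=rng)
    print(name, round(fb, 2))
```

Because the post-decisional walk is stochastic, feedback differs from trial to trial even for identical RTs; only the (α, β) pairing differs systematically between conditions.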
Statistical analyses. All data were analyzed using mixed effects models. We started from models including the fixed factors and their interaction(s), as well as a random intercept for each participant. These models were then extended by adding random slopes, only when this significantly improved model fit. Confidence ratings and RT were analyzed with linear mixed effects models, for which we report F statistics and the degrees of freedom as estimated by Satterthwaite's approximation. Accuracy was analyzed using a generalized linear mixed model, for which we report χ² statistics. We computed Cramér's V as effect sizes for chi-squared tests [68]. All model fit analyses were done using the lmerTest R package [69].
Bounded evidence accumulation. We modeled choice and RT data using the drift diffusion model (DDM), a popular variant of the wider class of accumulation-to-bound models. In the DDM, noisy evidence (representing the difference between the evidence for both options) is accumulated, the strength of which is controlled by a drift rate v, until one of two decision thresholds a or -a is reached. Non-decision components are captured by a non-decision time parameter ter. To simulate data from the model, random walks were used as a discrete approximation of the continuous diffusion process of the drift diffusion model. Each simulated random walk process started at z*a (here, z was an unbiased starting point fixed to 0). At each time step τ, accumulated evidence changed by Δ, with Δ given in Eq (4):

$$\Delta = v\,\tau + \sigma\sqrt{\tau}\,\mathcal{N}(0,1) \qquad (4)$$

Within-trial variability is given by σ. In all simulations, τ was set to 1 ms, and σ was fixed to .1.
Model fitting. Model predictions were obtained from the random-walk simulation described above. Evidence continued to accumulate after threshold crossing for a duration that was sampled from the confidence RT distribution of the trials being fitted. Note that this sampling was done without replacement, ensuring that the simulated confidence RT distribution exactly matched the empirically observed confidence RT distribution. The number of simulated trials was equal to 20 times the number of empirical trials being fitted, to ensure that every trial of the empirical confidence RT distribution was simulated an equal number of times. Given that the model-generated confidence comes on a continuous scale from 0 to 1, we binned the model output into 6 equally spaced bins.
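The random-walk approximation and the binning step can be sketched as below. This is an illustrative sketch under assumptions rather than the fitting code: the step rule is the standard Euler discretization of a diffusion (drift times τ plus Gaussian noise scaled by σ√τ), times are expressed in seconds (τ = 0.001 for 1 ms), post-decisional evidence keeps the same drift rate, and the function names and example parameter values are hypothetical.

```python
import numpy as np

def simulate_trial(v, a, ter, rng, sigma=0.1, tau=0.001, z=0.0, post_time=0.0):
    """Simulate one trial; returns (choice, rt, evidence at confidence report)."""
    evidence = z * a                    # unbiased start when z = 0
    decision_time = 0.0
    noise_sd = sigma * np.sqrt(tau)
    # Accumulate until the upper (a) or lower (-a) threshold is reached.
    while abs(evidence) < a:
        evidence += v * tau + noise_sd * rng.standard_normal()
        decision_time += tau
    choice = 1 if evidence > 0 else -1
    # Post-decisional accumulation for a duration equal to the confidence RT.
    n_post = int(round(post_time / tau))
    if n_post > 0:
        evidence += v * tau * n_post + noise_sd * rng.standard_normal(n_post).sum()
    return choice, decision_time + ter, evidence

def bin_confidence(conf, n_bins=6):
    """Map continuous confidence in [0, 1] onto n_bins equally spaced levels 1..n_bins."""
    conf = np.asarray(conf, dtype=float)
    return np.minimum((conf * n_bins).astype(int), n_bins - 1) + 1

rng = np.random.default_rng(2024)
trials = [simulate_trial(v=0.1, a=0.08, ter=0.3, rng=rng, post_time=0.5)
          for _ in range(100)]
accuracy = np.mean([choice == 1 for choice, _, _ in trials])  # upper bound = correct for v > 0
print(accuracy, bin_confidence([0.0, 0.5, 1.0]))
```

Sampling `post_time` from the observed confidence RTs without replacement, as described above, would replace the fixed 0.5 s used here.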
Accuracy and RT data of each task and participant were modeled using five DDM parameters: non-decision time, decision threshold and three drift rate parameters (one for each trial difficulty level). Additionally, α and β were fitted to the confidence judgments, separately for each feedback condition. We implemented quantile optimization and computed the proportion of trials falling within each of six groups formed by quantiles .1, .3, .5, .7 and .9 of RT, separately for corrects and errors. Similarly, for confidence ratings, we computed the proportion of trials falling at each of the 6 levels of confidence judgment, separately for corrects and errors. The resulting objective function consisted in minimizing the sum of squared errors described in Eq (5):

$$SSE = \sum_{i=1}^{N_q}\sum_{k=0}^{1}\left(oRT_{i,k} - pRT_{i,k}\right)^2 + \sum_{i=1}^{N_{cl}}\sum_{k=0}^{1}\left(oCJ_{i,k} - pCJ_{i,k}\right)^2 \qquad (5)$$

with N_q = N_cl = 6 the number of RT groups and of possible confidence values, oRT_{i,k} and pRT_{i,k} respectively the proportions of observed and predicted trials falling within quantile i of RT, separately for corrects (k = 1) and errors (k = 0), and oCJ_{i,k} and pCJ_{i,k} reflecting their counterparts for confidence. Models were fitted using a differential evolution algorithm [70], as implemented in the DEoptim R package [71]. The population size was set to 10 times the number of parameters to estimate. The algorithm stopped once no improvement of the objective function was observed for the last 100 generations.

Model comparison. All candidate models for the model comparison were based on the same estimated DDM parameters, fitted separately to accuracy and RT data (i.e. minimizing the first term of the SSE in Eq (5)). Each candidate model was then fit to confidence ratings (i.e. minimizing the second term of the SSE in Eq (5)). BIC values for model comparison were computed as follows:

$$BIC = n \ln\left(\frac{SSE}{n}\right) + k \ln(n)$$

with k the number of free parameters and n the number of data points. This formulation of BIC holds true assuming normally distributed model errors with zero mean [72]. BIC values for each model represented in Table 1 correspond to the mean BIC over participants. Bootstrapped 95% confidence intervals of confidence contrasts were obtained by simulating 500 datasets based on the fits of each participant and then computing the mean confidence contrasts of each repetition. The 95% confidence interval was computed as the .025 and .975 quantiles of the distribution formed by the bootstrapping.

Parameter recovery. To make sure that the estimated parameters from our model fits are interpretable, we performed a parameter recovery analysis that we report here. We simulated data from 200 simulated participants using parameters from the ranges observed in both Experiment 1 and 2. In order to reproduce the experimental settings, each simulated participant had 3 different trial difficulty levels (i.e. drift rates). We simulated 648 trials per participant (216 per drift rate). Recovery for all parameters was excellent (all rs > .93). Additionally, there was no significant correlation between estimated α and β, r(198) = .07, p = .30, suggesting that the two parameters are not trading off one against the other.
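As an illustration of the quantile-based objective and the BIC used for model comparison, the following Python sketch computes the two SSE terms and a BIC value. It rests on several assumptions not stated in the text: proportions are normalized by the total number of trials, RT group edges are taken from the observed RTs, BIC takes the Gaussian-error form n·ln(SSE/n) + k·ln(n), and all function names and the dictionary layout are hypothetical. The original fits were run with DEoptim in R; this sketch only covers the objective.

```python
import numpy as np

RT_QUANTILES = (0.1, 0.3, 0.5, 0.7, 0.9)  # quantiles named in the text
N_LEVELS = 6                               # six RT groups / confidence levels

def rt_proportions(obs_rt, obs_correct, pred_rt, pred_correct):
    """Observed and predicted proportions per RT group, corrects then errors."""
    obs_p, pred_p = [], []
    for k in (1, 0):
        o, p = obs_rt[obs_correct == k], pred_rt[pred_correct == k]
        # Group edges come from the observed RTs; fall back to all RTs if a
        # response class is empty (an assumption of this sketch).
        edges = np.quantile(o if o.size else obs_rt, RT_QUANTILES)
        obs_p.append(np.bincount(np.searchsorted(edges, o),
                                 minlength=N_LEVELS) / obs_rt.size)
        pred_p.append(np.bincount(np.searchsorted(edges, p),
                                  minlength=N_LEVELS) / pred_rt.size)
    return np.concatenate(obs_p), np.concatenate(pred_p)

def cj_proportions(cj, correct):
    """Proportion of trials at each confidence level (1..6), corrects then errors."""
    parts = [np.bincount(cj[correct == k] - 1, minlength=N_LEVELS) for k in (1, 0)]
    return np.concatenate(parts) / cj.size

def sse_objective(obs, pred):
    """Sum of squared errors over RT groups (first term) and confidence levels
    (second term); obs and pred are dicts with keys 'rt', 'correct', 'cj'."""
    o_rt, p_rt = rt_proportions(obs["rt"], obs["correct"],
                                pred["rt"], pred["correct"])
    o_cj = cj_proportions(obs["cj"], obs["correct"])
    p_cj = cj_proportions(pred["cj"], pred["correct"])
    return np.sum((o_rt - p_rt) ** 2) + np.sum((o_cj - p_cj) ** 2)

def bic_from_sse(sse, n, k):
    """BIC under zero-mean Gaussian errors: n * ln(SSE / n) + k * ln(n)."""
    return n * np.log(sse / n) + k * np.log(n)
```

A global optimizer such as differential evolution would then call `sse_objective` with predictions simulated under each candidate parameter set.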

Fig 1. A. Confidence is thought to represent the Bayesian probability of a choice being correct conditional on evidence, time and choice. Within this theory, confidence is quantified as this probability, represented by the color on the heatmap. B. Because this optimal solution is intractable, the LDC model proposes a low-dimensional parametrization of this framework, which allows efficient estimation of confidence while accounting for idiosyncrasies and confidence biases. The LDC model can generate a heatmap representing confidence which closely approximates the optimal Bayesian probability. Values of α and β were obtained by fitting the LDC model to the Bayesian probability of being correct over 1 000 000 simulated trials. Confidence for the trial plotted on top of the heatmap is given by Eq (3). Here, confidence = .85. C-F. To show the effectiveness of the LDC model, we generated statistical signatures of confidence [1] based on the Bayesian read-out of confidence (error bars reflecting SEM, simulated N = 100) and based on the LDC model fits (shaded lines reflecting SEM). High and low confidence trials in panel E were obtained by performing a median split.

Fig 2. Experimental design. In both experiments, participants performed three different perceptual decision-making tasks (only one shown here). Each task started with a training phase during which a different feedback manipulation was induced. A. In Experiment 1, participants received fake feedback after each training block, framed as a comparison between their performance and the performance of a reference group. B. In Experiment 2, participants additionally rated their confidence before receiving trial-by-trial feedback reflecting their probability of making the correct choice. Unknown to participants, the feedback was actually generated by the LDC model. To do so, the evidence accumulation process for each trial was estimated using the mean drift rate and boundary from a previous pilot session (see Methods for full details). Feedback conditions differed in the α (resp. β) value used to generate feedback in Experiment 2A (resp. Experiment 2B). C. In both experiments, after each training phase participants completed a test phase during which they no longer received feedback but rated their decision confidence after each decision.

Fig 3. A key prediction of the LDC model is that confidence can vary independently of task performance. A-B. In Experiment 1, providing participants with fake feedback telling them their performance was better than, equal to or worse than that of a reference group indeed left RT (A) and accuracy (B) unaffected. C. On the other hand, fake feedback selectively influenced the reported level of confidence on correct trials. These results were closely captured by fitting the LDC model to these data. Note: Solid lines represent empirical data. Error bars represent standard error of the mean. Shades represent standard error of the mean of predictions of the LDC model. n.s. = p > .05; *** = p < .001. Significance indicators refer to the effect of feedback condition. https://doi.org/10.1371/journal.pcbi.1012273.g003

Fig 4. Best fitting parameters from model fits of Experiment 1. A-B. The feedback manipulation influenced both α and β, with more positive feedback eliciting higher α and β. C-E. The DDM parameters were not influenced by the feedback manipulation, except for a small significant effect on the bound. Grey dotted lines refer to individual fits. Note: Black lines are mean values. Error bars are SEM. https://doi.org/10.1371/journal.pcbi.1012273.g004

Fig 5. A key prediction of the LDC model is that participants should be sensitive to the specific parametrization of confidence proposed by the model. To test this, Experiment 2 provided participants with probabilistic feedback generated by the LDC model. Critically, LDC-based feedback was generated using different levels of α or different levels of β. A. Changing α influences the feedback for correct trials but not for errors. B. Changing β influences feedback for both corrects and errors. The pattern seen in the feedback (which effectively constitutes our predictions) was also seen in the behavioral data. C. α-manipulated feedback influenced confidence reports for correct but not error trials. D. β-manipulated feedback influenced confidence reports on both correct and error trials. E. Fitting the LDC model to the empirical data of Experiment 2 revealed that data in the α-manipulated feedback condition were best explained by a model in which α was allowed to vary. F. Data from the β-manipulated feedback condition were best explained by a model in which β was allowed to vary. To visualize this, we computed confidence contrasts for the empirical data (black lines), as well as for the α-free (yellow distribution) and β-free (blue distribution) model fits, separately for corrects and errors. "Interaction" refers to the difference between the confidence contrast in corrects and errors. G. Parameter estimates from the α-free model. Estimated α was higher when feedback was generated by higher α. H. Parameter estimates from the β-free model. Estimated β was higher when feedback was generated by higher β. As a reference, the values of α and β used to generate feedback are added as crosses in panels G and H.
Note: grey dots and lines refer to individual estimates. Black dots correspond to sample means; distributions correspond to the bootstrapped mean predicted confidence contrasts. Error bars and shaded areas represent empirical and model-simulated SEM, respectively. n.s. = p > .05; * = p < .05; *** = p < .001. https://doi.org/10.1371/journal.pcbi.1012273.g005